A Probabilistic Approach to Language Structure

Authors

  • Annarita Felici
  • Paul Pal
Abstract

The translation of international legal instruments requires a high degree of accuracy and consistency. With the increasing demand for multilingual texts, translation memory tools and research on parallel corpora have proved particularly useful for translating repetitive documents, as well as documents subject to an evolving drafting and production process. Starting from this assumption, the present study aims to predict translation equivalents with the help of a probabilistic approach. The data (1,404,723 words) consist of a multilingual parallel corpus in four languages: English, French, German and Italian. All the documents are taken from EU secondary legislation and include Regulations, Decisions, Directives and Recommendations, selected from the years 2001-04. The texts are all strictly ‘normative’, and their discourse is expected to be precise, with minimal scope for ambiguity. The main focus is on prescriptive statements, namely deontic norms (permission, obligation, prohibition) and constitutive performatives. Their formulation is highly standardized in English, both within and outside the EU context. Their expression in the other languages, by contrast, is more vague and extensive, with potential consequences for the translation of norms. Bearing these remarks in mind, our objectives are: 1) to evaluate the degree of prescriptive standardization in English and in the other three languages, and 2) to predict translation equivalents in the other languages given that (i) English legal drafting is highly standardized, (ii) the EU and the main English drafting guidelines tend to use modal verbs in prescriptive statements, (iii) the text types under examination are repetitive and reusable, and (iv) the four EU instruments can be more or less binding. English modals are used as the main entry point, and entropy analysis is used to measure the number of alternatives (the degree of uncertainty) occurring in the other three languages.
Adding knowledge to a system (e.g. a more standardized formulation) reduces the number of alternatives (uncertainty), which leads to a decrease in entropy and a gain of information in the expression of the norm. Although language phenomena cannot be fully described, the results of this analysis show empirically that, given a set of conditions, certain linguistic structures are more easily predictable than others when several languages are compared. Analyses of this type can foster research in language testing and evaluation, and in the development of automated translation tools.
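The entropy measure mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard Shannon entropy over the relative frequencies of the renderings aligned with one English modal; the abstract does not state the exact formula used in the study. The function name `translation_entropy` and the Italian renderings below are hypothetical and are not data from the corpus.

```python
import math
from collections import Counter


def translation_entropy(equivalents):
    """Shannon entropy, in bits, of a list of observed translation equivalents.

    Assumption: each list item is one rendering aligned with a single
    occurrence of the same English modal.
    """
    counts = Counter(equivalents)
    total = sum(counts.values())
    # H = sum over alternatives of p * log2(1 / p)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical renderings of English "shall" (illustrative only).
standardized = ["deve"] * 8
varied = ["deve"] * 4 + ["è tenuto a"] * 2 + ["provvede"] * 2

h_standardized = translation_entropy(standardized)  # one alternative only
h_varied = translation_entropy(varied)              # three alternatives
```

In this toy case the single-rendering list has zero entropy, while the list with three alternatives has a higher value, which mirrors the abstract's point that a more standardized formulation lowers uncertainty.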


Similar resources

Studying impressive parameters on the performance of Persian probabilistic context free grammar parser

In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, The Penn Treebank, was published. However, although originating in computational linguistics, the value of tree bank is becoming more widely appreciated in linguistics research as a whole. F...


A Trust Based Probabilistic Method for Efficient Correctness Verification in Database Outsourcing

Correctness verification of query results is a significant challenge in database outsourcing. Most of the proposed approaches impose high overhead, which makes them impractical in real scenarios. Probabilistic approaches are proposed in order to reduce the computation overhead pertaining to the verification process. In this paper, we use the notion of trust as the basis of our probabilistic app...


A Reflection on Kristeva's Approach to the Structure of Language

Reaching out to history and subject in terms of meaning variation, Kristeva could show that language cannot simply be a Saussurean sign system. Rather, she went on to delineate that language, beyond signs, is associated with a dynamic system of signification where the "speaking subject" is constantly involved in processing. Julia Kristeva, a French critic, psychoanalyst, theoretician, a post-...


The Factor Structure of a Written English Proficiency Test: A Structural Equation Modeling Approach

The present study examined the factor structure of the University of Tehran English Proficiency Test (UTEPT) that aims to examine test takers’ knowledge of grammar, vocabulary, and reading comprehension. A Structural Equation Modelling (SEM) approach was used to analyse the responses of participants (N = 850) to a 2010 version of the test. A higher-order model was postulated to test if the unde...


A COMMON FRAMEWORK FOR LATTICE-VALUED, PROBABILISTIC AND APPROACH UNIFORM (CONVERGENCE) SPACES

We develop a general framework for various lattice-valued, probabilistic and approach uniform convergence spaces. To this end, we use the concept of $s$-stratified $LM$-filter, where $L$ and $M$ are suitable frames. A stratified $LMN$-uniform convergence tower is then a family of structures indexed by a quantale $N$. For different choices of $L,M$ and $N$ we obtain the lattice-valued, probabili...


Fuzzy completion time for alternative stochastic networks

In this paper a network comprising alternative branching nodes with probabilistic outcomes is considered. In other words, network nodes are probabilistic with exclusive-or receiver and exclusive-or emitter. First, an analytical approach is proposed to simplify the structure of network. Then, it is assumed that the duration of activities is positive trapezoidal fuzzy number (TFN). This paper com...




Publication date: 2008